Education For Employment (EFE) is the leading nonprofit that trains youth and links them to jobs across the Middle East and North Africa (MENA). This pivotal region is the hardest place on the planet for youth to get their first job – they are three times more likely to be unemployed than older adults.
EFE is interested in the effectiveness of its programs, particularly whether graduates find stable employment. We have data on about 7,000 participants in almost 500 program cohorts spread across 8 countries. Participants bring diverse skills, interests, and backgrounds. Programs employ a variety of training models and placement policies. How well are different programs working, and for whom?
EFE has a Salesforce database that houses all information about the organization’s programs, participants, and job placement and retention outcomes. The datasets used in this project include:
Initially, the data exports from Salesforce were pre-processed in the following ways:
The preprocessing script can be viewed on GitHub so that the deidentification steps can be reproduced with new data exports.
The contact dataset contains 8 columns that relate to when each participant obtained employment. These can be collapsed into a single column that gives the time it took for the participant to get placed, or that they were not placed or could not be reached. Below, the new composite column is on the right, and the original job placement columns can be removed.
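As a minimal sketch of this kind of collapse, the function below reduces several per-interval placement answers to one composite value. The interval labels, the "YES"/"NO" answer values, and the use of `None` for "could not be reached" are all assumptions made for illustration; the real export has 8 columns whose names and coding are not shown here.

```python
# Hypothetical interval labels; the real Salesforce export uses its own column names.
INTERVALS = ["3 months", "6 months", "9 months", "12 months"]


def collapse_placement(row):
    """Collapse per-interval placement answers into one composite value.

    `row` maps an interval label to "YES", "NO", or None (assumed here to
    mean the participant could not be reached at that interval).
    """
    answers = [row.get(label) for label in INTERVALS]
    # The earliest interval with a "YES" is taken as the time to placement.
    for label, answer in zip(INTERVALS, answers):
        if answer == "YES":
            return label
    # No placement recorded: distinguish "never reached" from "reached, no job".
    if all(answer is None for answer in answers):
        return "Could not be reached"
    return "Not placed"


example = collapse_placement({"3 months": "NO", "6 months": "YES"})
```

Returning a single label per participant is what allows the original placement columns to be dropped afterwards.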
Job retention at 6 months is the initial outcome variable in the analysis. Therefore, participants who could not be reached 6 months after job placement are filtered out. Participants who graduated less than 6 months before the data was pulled, and participants who got a job more than 30 days before graduating, are also filtered out. After these steps, only 2652 of the original 7124 participants remain in the dataset used for the analysis.
Overall, 47.6% of the 2652 remaining participants had retained employment 6 months after being placed in a job.
Because the analysis initially focuses on retention at 6 months, the employment status check data is filtered to contain only the 6-month surveys. Those survey responses can then be joined to the participant contact information while retaining a 1:1 relationship.
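The three exclusion rules can be sketched as a single predicate. The field names (`graduation`, `placement`, `retained_6m`), the sample records, and the approximation of 6 months as 183 days are assumptions for illustration, not the project's actual schema or cutoffs.

```python
from datetime import date, timedelta


def keep_participant(p, pull_date):
    """Return True if a participant record (a dict) survives all three filters."""
    # Rule 1: the participant must have been reached for the 6-month check.
    if p["retained_6m"] is None:
        return False
    # Rule 2: graduation must be at least 6 months (approximated as 183 days)
    # before the date the data was pulled.
    if p["graduation"] > pull_date - timedelta(days=183):
        return False
    # Rule 3: drop anyone who got a job more than 30 days before graduating.
    if p["placement"] < p["graduation"] - timedelta(days=30):
        return False
    return True


pull = date(2018, 1, 1)  # hypothetical pull date
participants = [
    {"id": 1, "graduation": date(2017, 3, 1), "placement": date(2017, 4, 1), "retained_6m": "YES"},
    {"id": 2, "graduation": date(2017, 11, 1), "placement": date(2017, 11, 15), "retained_6m": "NO"},
    {"id": 3, "graduation": date(2017, 3, 1), "placement": date(2017, 1, 1), "retained_6m": "YES"},
    {"id": 4, "graduation": date(2017, 3, 1), "placement": date(2017, 4, 1), "retained_6m": None},
]
kept_ids = [p["id"] for p in participants if keep_participant(p, pull)]
```

In this made-up sample only the first record passes: the second graduated too recently, the third was placed well before graduating, and the fourth could not be reached.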
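One way to keep the join 1:1 is to fail loudly if any participant has more than one 6-month survey. The sketch below assumes hypothetical field names (`participant_id`, `interval`, `retained`) and plain dictionaries rather than the actual export format.

```python
def join_six_month_checks(contacts, checks):
    """Attach each participant's 6-month retention answer to their contact row."""
    six_month = {}
    for check in checks:
        if check["interval"] != "6 months":
            continue  # keep only the 6-month surveys
        pid = check["participant_id"]
        if pid in six_month:
            # A second 6-month survey would break the 1:1 relationship.
            raise ValueError(f"duplicate 6-month check for participant {pid}")
        six_month[pid] = check["retained"]
    # Left join: every contact row is kept; a missing survey becomes None.
    return [{**c, "retained_6m": six_month.get(c["participant_id"])} for c in contacts]


contacts = [{"participant_id": 1}, {"participant_id": 2}]
checks = [
    {"participant_id": 1, "interval": "3 months", "retained": "YES"},
    {"participant_id": 1, "interval": "6 months", "retained": "NO"},
]
joined = join_six_month_checks(contacts, checks)
```

The output has exactly one row per contact, so row counts before and after the join can be compared as a quick sanity check.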
The pre and post training surveys contain questions around confidence and self-efficacy that EFE is interested in looking into. There are five different questions relating to confidence, with answers on a “not at all confident” to “very confident” scale. These answers are turned into numbers so that a composite index can be created, and so that changes in confidence after participants have been through the training programs can be more easily measured.
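A minimal sketch of the scoring follows. Only the two endpoint labels come from the survey description; the intermediate labels, the 1 to 5 coding, and the use of a simple mean as the composite index are assumptions for illustration.

```python
# Only the endpoints are known from the survey; the middle labels are hypothetical.
SCALE = {
    "not at all confident": 1,
    "slightly confident": 2,   # hypothetical label
    "somewhat confident": 3,   # hypothetical label
    "confident": 4,            # hypothetical label
    "very confident": 5,
}


def composite(answers):
    """Average the numeric scores of the five confidence answers."""
    scores = [SCALE[a.strip().lower()] for a in answers]
    return sum(scores) / len(scores)


# Made-up answers for one participant, before and after training.
pre = ["Not at all confident", "Somewhat confident", "Confident",
       "Somewhat confident", "Slightly confident"]
post = ["Confident", "Very confident", "Confident",
        "Somewhat confident", "Confident"]

# A positive change means confidence rose between the two surveys.
change = composite(post) - composite(pre)
```

Taking the difference of two composites keeps the change score on the same scale as the individual answers, which makes it easier to read than a sum of raw differences.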
Of the 2652 participants who remain in the filtered dataset, 793 have pre/post survey data. A sample of the calculated confidence and self-efficacy change scores is shown below. These are each then joined to the contacts dataset.
The plots below use the composite confidence and self-efficacy scores to show overall changes between pre and post surveys. These plots include all participants with pre and post survey data, not only those who also have 6-month job retention data.
##### Do the confidence and self-efficacy scores correlate with retention?
A categorical variable was calculated to identify those who had a job at one point, and then lost it. This was calculated by looking at each of the four time intervals (3 months, 6 months, 9 months, and 12 months), and taking all participants with a retention at that stage of “NO,” who also had a placement at any of the preceding time intervals.
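The rule can be sketched as below. The dictionary layout and the "YES"/"NO" coding are assumptions, and "preceding" is read strictly, so a "NO" at 3 months never counts on its own because no earlier interval exists in this sketch.

```python
INTERVALS = ["3 months", "6 months", "9 months", "12 months"]


def lost_job(placed, retained):
    """Flag a participant who held a job at one point and later lost it.

    `placed` and `retained` map interval labels to "YES", "NO", or None.
    """
    for i, label in enumerate(INTERVALS):
        if retained.get(label) != "NO":
            continue
        # Require a placement at any strictly earlier interval.
        if any(placed.get(prev) == "YES" for prev in INTERVALS[:i]):
            return True
    return False


flag = lost_job({"3 months": "YES"}, {"6 months": "NO"})
```

A participant placed and checked at the same interval is not flagged here; whether the original calculation treated that case differently is not stated.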
Using the categorization calculated above for who lost their jobs at any point, the reasons for leaving were examined. Of 1165 participants who met these criteria, 0 participants had at least one answer in an employment status check survey where they gave a reason for leaving their first job. If there were answers given in multiple employment status checks, the most recent answer was used. A sample of how this was done is below, where “ReasonsforLeaving” is the combined field using the most recent response. Responses were also cleaned to remove stray commas and nested answers.
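The "most recent answer wins" rule might look like the sketch below. The input shape (a list of survey date and reason pairs) and the sample values are assumptions. Only stray commas and surrounding spaces are cleaned here; the handling of nested answers is not reproduced.

```python
from datetime import date


def latest_reason(checks):
    """Return the most recent non-empty reason for leaving, or None.

    `checks` is a list of (survey_date, reason) tuples, one per
    employment status check.
    """
    # Ignore surveys where no reason was given (empty or only commas/spaces).
    answered = [(d, r) for d, r in checks if r and r.strip(" ,")]
    if not answered:
        return None
    # Pick the answer from the latest survey date.
    _, reason = max(answered, key=lambda item: item[0])
    # Remove stray leading or trailing commas and spaces.
    return reason.strip(" ,")


checks = [
    (date(2017, 5, 1), "Salary,"),
    (date(2017, 8, 1), ""),
    (date(2017, 11, 1), ", Position is Temporary"),
]
reasons_for_leaving = latest_reason(checks)
```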
The frequency of reasons given for leaving the first job is below, with NA and Other being the most frequent, and “Position is Temporary” and “Salary” being the top non-other responses for participants where there is data.
A series of models will be tested to determine which features are important in whether or not participants retained the job they were placed in after 6 months.